Julia Schulte-Cloos
January 25, 2023
🙌 Benefit yourself! 🙌
‘Create a better relationship with your future self’
🚀 That’s the future of the social sciences! 🚀
Journals and funding agency requirements, see e.g. the Sherpa Romeo Database or the Plan S Journal Checker Tool
Open science and data sharing
Confidence in your own work, easier collaboration and smoother workflows
Replicability refers to situations in which a researcher obtains new data to reach the same scientific conclusions as a previous study, whereas reproducibility refers to situations in which the original researcher’s software, code, and data are used to regenerate the results.
✅ Replication standards: guidelines, protocols, and software designed to help researchers share, analyze, archive, preserve, distribute, catalog, translate, verify, and replicate scholarly research data and analyses across disciplines. Includes proposals to improve the norms around data sharing and replication in scientific research.
Integrate computer code with software documentation in a single document
read.csv('./data/foo.csv')
sessionInfo()
(figs, data, etc.)
- ./data
  + `raw_data.csv`
  + `tidy_data.csv`
  + `codebook.txt`
- ./analysis
- ./figures
  + ./interaction_plot.png
  + ./bar_plot.png
- ./paper
- ./presentation
- ./README.md
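As a minimal sketch of how this folder layout combines with relative paths and session information: the code assumes it is run from the project root, and the output file name `session_info.txt` is hypothetical, not part of the structure above.

```{r}
# Read the raw data via a path relative to the project root,
# so the code runs unchanged on any machine.
raw_data <- read.csv("./data/raw_data.csv")

# Record the R version and loaded packages alongside the project.
# "session_info.txt" is a hypothetical file name.
writeLines(capture.output(sessionInfo()), "./session_info.txt")
```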
Markdown as a human readable way to style text
“Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).” John Gruber, founder of Markdown
R and RStudio (not the only IDE that supports RMarkdown; Visual Studio Code is also a great choice)
RMarkdown integrates R code into Markdown language through knitr
Quarto: extension of RMarkdown, optimised for language interoperability & CLI
**bold text** or equivalently __bold__
*italic* or equivalently _italic_
# A level-one section
## A level-two section with a [link](/url)
# An unnumbered section {-}, or equivalently # An unnumbered section {.unnumbered}
{#sec:introduction}
# Reproducible research outputs → {#reproducible-research-outputs}.
Bullet list
Numbered lists
^[footnote]
…mostly a matter of taste 🍷🍺
Control how code and its output appear in your compiled report or manuscript. Chunk labels are optional, but each label must be unique within the document, e.g. {r data2017-tidy}
Define the conditions under which the code is evaluated and how its output is processed within the document. The most frequent options include eval, include, results, and echo. Comprehensive lists are available online, in the RMarkdown reference guide and in the Quarto documentation. Most IDEs allow you to easily navigate between chunks.
→ old-school way to specify chunk options
```{r elephant-chunk-1, out.width="20%", fig.align="center", fig.cap="Elephant in the room", echo="fenced"}
knitr::include_graphics(path = "figs/elephant.jpg")
```
→ more recently, chunk options can be specified in YAML style within the actual code chunk, for better readability
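A sketch of the same chunk written with YAML-style options, assuming Quarto's `#|` comment syntax and its hyphenated option names (`out-width`, `fig-align`, `fig-cap`). The label `fig-elephant` is chosen here so that the figure can be cross-referenced.

```{r}
#| label: fig-elephant
#| out-width: "20%"
#| fig-align: "center"
#| fig-cap: "Elephant in the room"
#| echo: fenced
knitr::include_graphics(path = "figs/elephant.jpg")
```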
The slope of the regression is 3.93.
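A sentence like this is typically produced with inline R code, so the number updates whenever the data or the model changes. The sketch below assumes the built-in `cars` data and a simple linear model; the source does not say which model the figure comes from, and the object names `fit` and `slope` are hypothetical.

```{r}
# Assumption: a simple linear regression on the built-in `cars` data.
fit <- lm(dist ~ speed, data = cars)

# Extract the slope coefficient and round it for reporting.
slope <- round(coef(fit)[["speed"]], 2)
```

Writing `` `r slope` `` in the running text then inserts the computed value in place of a hardcoded number.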
output, title, author, date
---
title: "Writing a reproducible research paper"
author: "Julia Schulte-Cloos"
date: 2023-01-25
format:
  pdf:
    execute:
      echo: false
---
In doubt about YAML validity? Use an available YAML linter.
You can render your document with globally specified parameters (set in the YAML header) that affect how your code is evaluated, e.g. by focusing on only a subset of your data.
---
title: "My Document"
params:
  alpha: 0.1
  ratio: 0.1
---
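Inside the document, knitr exposes these values as a read-only list called `params`. The sketch below assumes, purely for illustration, that `ratio` is the share of rows to keep and `alpha` is a point transparency; the source does not say what the two parameters mean.

```{r}
# `params` is created by knitr from the YAML header at render time.
# Assumption: `ratio` is the fraction of rows to sample.
set.seed(42)
n_keep <- ceiling(params$ratio * nrow(cars))
cars_subset <- cars[sample(nrow(cars), n_keep), ]

# Assumption: `alpha` controls the transparency of the plotted points.
plot(cars_subset, pch = 19,
     col = grDevices::adjustcolor("black", alpha.f = params$alpha))
```

The defaults can be overridden at render time, e.g. with `quarto render document.qmd -P ratio:0.5`, where `document.qmd` is a placeholder file name.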
quarto
---
format:
  pdf:
    toc: true
---
indent: true in the YAML header
geometry option in the YAML header
Include your literature.bib file in the YAML header (YAML key: bibliography:).
Cite any entry as recorded in the .bib file by calling @palmerdata.2020 for inline citations and [@palmerdata.2020, p.10] for all other references.
If a CSL style is specified, Pandoc converts Markdown references, e.g. @palmerdata.2020, to ‘hardcoded’ text with a hyperlink to the reference section of your document.
If your document specifies a citation package like biblatex or natbib along with the related options, Pandoc will create the corresponding LaTeX commands (e.g. \autocite or \citep) to generate the references from Markdown references. This is not recommended, because it limits your flexibility regarding output formats.
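A minimal header sketch for the citation workflow described above, assuming the bibliography file sits next to the document. The CSL file name is a placeholder for whichever style you download.

```yaml
---
bibliography: literature.bib
csl: my-journal-style.csl  # hypothetical file name
---
```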
Cross-reference sections, figures, tables or equations: e.g., @fig-elephant.
With the colorlinks: true option in the YAML header, hyperlinks are colored.
If you do not specify a section label, Pandoc will automatically assign a label based on the title of your header. For more details, see the Pandoc manual. If you wish to add a manual label to a header, add {#mylabel} to the end of the section header.
{#fig-elephant}
⚠️ Quarto uses a slightly different syntax to cross-reference figures than RMarkdown: @fig-elephant
Add a dedicated code chunk option #| layout-ncol: 2 to your code chunks to include several figures side by side.
This is very powerful in conjunction with #| fig-subcap: to specify a caption for each figure.
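The two options can be combined as in the following sketch, which assumes the built-in `cars` and `pressure` datasets; the label and captions are hypothetical.

```{r}
#| label: fig-side-by-side
#| layout-ncol: 2
#| fig-cap: "Two base R plots side by side"
#| fig-subcap:
#|   - "Speed and stopping distance"
#|   - "Vapor pressure of mercury"
plot(cars)      # first panel, takes the first sub-caption
plot(pressure)  # second panel, takes the second sub-caption
```

With a label starting with `fig-`, the whole figure can be referenced as @fig-side-by-side and the panels as @fig-side-by-side-1 and @fig-side-by-side-2.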
### References
::: {#refs}
:::
section-bibliographies
(modelsummary)
Integrate two tables (or figures) side-by-side, each with its own sub-caption in your Quarto document
Advantages? 🤔
Approach 1
#| eval: !expr knitr::is_html_output()
Approach 2
execute YAML option
execute options (no indentation, for any type of format)
format specific option (indentation, specific for each format)
---
format:
  html:
    toc: true
    code-fold: true
    execute:
      echo: true
  pdf:
    toc: false
    execute:
      echo: false
execute:
  warning: false
  message: false
---
.content-visible class
.content-hidden class
::: {.content-hidden unless-format="pdf"}
Will only appear in PDF.
:::
The code chunk option ref.label takes a vector of chunk labels to retrieve the content of the respective chunks.
ref.label can also evaluate R code, e.g. to retrieve the code of all labels within a document (knitr::all_labels()).
# Appendix: All code for this presentation
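A sketch of such an appendix chunk in the old-school option syntax, following the common knitr pattern: the chunk body stays empty, `ref.label` pulls in the code of every labelled chunk, and `eval=FALSE` keeps that code from being run a second time. The chunk label `appendix-code` is hypothetical.

```{r appendix-code, ref.label=knitr::all_labels(), echo=TRUE, eval=FALSE}
```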
…or a subset of chunks that are also evaluated when rendering the document:
knitr engine, jupyter can be used
---
title: "My Document"
jupyter: python3
---
Quarto offers more control regarding the inclusion of author-related meta-data (names, affiliations, contributions to the work) that is printed as part of the title, in some output formats. See the full documentation
---
author:
  - name:
      given: Norah
      family: Jones
      literal: Norah Jones
    attributes:
      corresponding: true
    orcid: 0000-1234-0000-5678
---
quarto install extension pandoc-ext/abstract-section
Add some content that should be excluded depending on the output format (HTML blog post vs. PDF manuscript).
Add a filter that allows you to draft your abstract as part of the main text (rather than in the YAML meta data).
Reproducible reports - Mainz